Contextual Bandits: Approximated Linear Bayes for Large Contexts

Author

  • Jose Antonio Martin
Abstract

Contextual bandits, and informed decision making in general, can be studied in the stochastic/statistical setting by means of the conditional-probability paradigm, where Bayes’ theorem plays a central role. However, when decisions must be made over very large contextual information, or when the information is spread across many variables with a long history of observations and decision time is critical, the exact computation of Bayes’ rule to produce the best decision given the available information is unaffordable. In this increasingly common setting, derivations of and approximations to conditional probability and Bayes’ rule will progressively gain applicability. In this article, we present an algorithm able to handle large contextual information, in the form of binary features, for optimal decision making in contextual bandits. The algorithm is analyzed with respect to its scalability: the time required to select the best choice and the time required to update its policy. Last but not least, we address the exploration/exploitation issue, explaining how, despite the incomputability of an optimal tradeoff, the proposed algorithm “naturally” balances exploration and exploitation using common sense.
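The abstract stops short of pseudocode, so the following is only a minimal sketch under one reading of the approach: that the linear Bayes approximation factorizes the reward probability over independent binary features, so that selection and update costs stay linear in the number of features. The class name, the method names, and the Beta-count/Thompson-style exploration are our illustrative assumptions, not the paper’s actual algorithm.

```python
import numpy as np

class LinearBayesBandit:
    """Hypothetical sketch: per-(arm, feature) reward counts over binary contexts."""

    def __init__(self, n_arms, n_features, prior=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        # Laplace-smoothed counts of rewarded / unrewarded observations
        # for every (arm, feature) pair.
        self.succ = np.full((n_arms, n_features), prior)
        self.fail = np.full((n_arms, n_features), prior)

    def select(self, x, explore=True):
        """Pick an arm for a binary context x (shape: n_features, values 0/1)."""
        if explore:
            # Thompson-style draw from the Beta posterior implied by the
            # counts -- our own way to obtain the "natural" exploration the
            # abstract alludes to; the paper's mechanism may differ.
            p = self.rng.beta(self.succ, self.fail)
        else:
            p = self.succ / (self.succ + self.fail)
        # Score each arm by the sum of per-feature log-odds over the active
        # features: a factorized, naive-Bayes-like linear score.
        scores = (np.log(p) - np.log1p(-p)) @ x
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Increment the counts of the chosen arm's active features only."""
        if reward:
            self.succ[arm] += x
        else:
            self.fail[arm] += x

# Smoke test with made-up sizes.
bandit = LinearBayesBandit(n_arms=3, n_features=8)
x = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=float)
arm = bandit.select(x)
bandit.update(arm, x, reward=1)
```

Under this factorization, `select` is a single matrix-vector product over the active features and `update` touches only the chosen arm’s counts, which matches the kind of scalability the abstract emphasizes.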


Similar Resources

Linear Bayes policy for learning in contextual-bandits

Machine and statistical learning techniques are used in almost all online advertisement systems. The problem of discovering which content is most in demand (e.g., receives more clicks) can be modeled as a multi-armed bandit problem. Contextual bandits (i.e., bandits with covariates, side information, or associative reinforcement learning) associate, to each specific content, several features that de...


Finite-Time Analysis of Kernelised Contextual Bandits

We tackle the problem of online reward maximisation over a large finite set of actions described by their contexts. We focus on the case when the number of actions is too big to sample all of them even once. However, we assume that we have access to the similarities between actions’ contexts and that the expected reward is an arbitrary linear function of the contexts’ images in the related repro...


Thompson Sampling for Contextual Bandits with Linear Payoffs

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance compared to state-of-the-art methods. However, many questions regarding its theoretical performance remained open. In this paper, we d...
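Since the teaser is cut off before the algorithm, here is a hedged sketch of the Gaussian-prior variant of Thompson Sampling with linear payoffs analyzed in this line of work; `contexts_fn`, `reward_fn`, and the scaling parameter `v` are placeholders we introduce for illustration.

```python
import numpy as np

def lin_ts(contexts_fn, reward_fn, d, rounds, v=0.5, seed=0):
    """Sketch of Thompson Sampling with linear payoffs (Gaussian posterior).

    contexts_fn(t) -> array of shape (n_arms, d) with per-arm contexts;
    reward_fn(t, arm) -> observed reward. Both are caller-supplied
    placeholders; v scales the posterior covariance.
    """
    rng = np.random.default_rng(seed)
    B = np.eye(d)       # precision matrix of the Gaussian posterior
    f = np.zeros(d)     # accumulated reward-weighted contexts
    for t in range(rounds):
        X = np.asarray(contexts_fn(t))
        mu_hat = np.linalg.solve(B, f)                   # posterior mean
        cov = v ** 2 * np.linalg.inv(B)                  # posterior covariance
        mu_tilde = rng.multivariate_normal(mu_hat, cov)  # posterior sample
        arm = int(np.argmax(X @ mu_tilde))               # best arm under the sample
        r = reward_fn(t, arm)
        B += np.outer(X[arm], X[arm])                    # rank-one precision update
        f += r * X[arm]
    return np.linalg.solve(B, f)  # final point estimate of the payoff vector
```

Exploration here comes entirely from sampling `mu_tilde` from the posterior rather than from an explicit bonus term, which is what makes the algorithm randomized.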


PAC-Bayesian Analysis of Contextual Bandits

We derive an instantaneous (per-round) data-dependent regret bound for stochastic multi-armed bandits with side information (also known as contextual bandits). The scaling of our regret bound with the number of states (contexts) N goes as ...


Linear Contextual Bandits with Knapsacks

We consider the linear contextual bandit problem with resource consumption, in addition to reward generation. In each round, the outcome of pulling an arm is a reward as well as a vector of resource consumptions. The expected values of these outcomes depend linearly on the context of that arm. The budget/capacity constraints require that the total consumption doesn’t exceed the budget for each ...
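The snippet above is truncated after the budget constraint; as a rough illustration of the interaction protocol it describes (not the paper’s algorithm), a toy simulation loop might look like the following, where the synthetic linear reward/consumption model and all names are our own assumptions.

```python
import numpy as np

def run_bandit_with_knapsacks(policy, rounds, n_arms, d, k, budget, seed=0):
    """Toy interaction loop for linear contextual bandits with knapsacks.

    policy(contexts, remaining) -> arm index. The reward and the k-dim
    consumption vector of each arm are linear in its d-dim context; the
    loop stops once any resource budget is exhausted.
    """
    rng = np.random.default_rng(seed)
    theta_r = rng.uniform(0, 1, d)       # hidden reward coefficients
    theta_c = rng.uniform(0, 1, (k, d))  # hidden consumption coefficients
    remaining = np.full(k, float(budget))
    total_reward = 0.0
    for _ in range(rounds):
        contexts = rng.uniform(0, 1, (n_arms, d))
        arm = policy(contexts, remaining)
        x = contexts[arm]
        total_reward += theta_r @ x      # reward, linear in the chosen context
        remaining -= theta_c @ x         # vector of resource consumptions
        if np.any(remaining <= 0):       # some budget exhausted: stop
            break
    return total_reward
```

A trivial greedy rule such as `lambda C, rem: int(np.argmax(C.sum(axis=1)))` can be plugged in as `policy` for a smoke test.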




Publication date: 2012